AITopics | expectation-maximization contrastive learning

Supplementary Material for "Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations"

Neural Information Processing Systems | Apr-27-2026, 14:19:57 GMT

Potential negative societal impacts: Although our work improves the performance of text-video retrieval, it may also make cross-modal retrieval of sensitive information on the network easier, which could raise challenges for protecting information security.

Limitations of our work: Iterative approaches are sensitive to initialization and to parameters such as the dimensions and the number of subspaces. Although we use L2 normalization to limit the value range of the parameters, the EM algorithm [3] may still converge to poor solutions. The choice of the number of subspaces also has a relatively significant impact on model performance.

artificial intelligence, machine learning, video, (15 more...)

Neural Information Processing Systems

Country:

Asia > China (0.16)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Industry: Information Technology > Security & Privacy (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Neural Information Processing Systems | Dec-25-2025, 05:20:45 GMT

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project video and text features into a common latent space according to the semantic similarities of text-video pairs. However, the shared latent spaces learned this way are often not optimal, and the modality gap between visual and textual representations cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. Specifically, we use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, so that the features can be concisely represented as linear combinations of these bases. This feature decomposition of video-and-language representations reduces the rank of the latent space, increasing its representational power for the semantics. Extensive experiments on three benchmark text-video retrieval datasets show that EMCL learns more discriminative video-and-language representations than previous methods and significantly outperforms previous state-of-the-art methods across all metrics. More encouragingly, the proposed method can be applied to boost the performance of existing approaches either as a jointly trained layer or as an out-of-the-box inference module with no extra training, making it easy to incorporate into existing methods.
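To make the idea of finding a compact set of bases with EM more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a generic soft-assignment scheme in which the E-step computes softmax responsibilities from feature-basis dot products and the M-step re-estimates each basis as an L2-normalized, responsibility-weighted sum of features. The names `em_bases`, `num_bases`, `num_iters`, and `sigma`, as well as the toy features, are hypothetical choices for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def e_step(features, bases, sigma):
    """Soft-assign every feature to every basis (assumed softmax form)."""
    return [
        softmax([sum(a * b for a, b in zip(f, mu)) / sigma for mu in bases])
        for f in features
    ]


def m_step(features, resp, num_bases):
    """Re-estimate each basis as a responsibility-weighted sum of features."""
    dim = len(features[0])
    bases = []
    for k in range(num_bases):
        acc = [0.0] * dim
        for r, f in zip(resp, features):
            for d in range(dim):
                acc[d] += r[k] * f[d]
        bases.append(l2_normalize(acc))  # keep the basis values bounded
    return bases


def em_bases(features, num_bases=2, num_iters=10, sigma=0.1, seed=0):
    """Alternate E and M steps, then rebuild features from the bases."""
    rng = random.Random(seed)  # the result depends on this initialization
    dim = len(features[0])
    bases = [
        l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(num_bases)
    ]
    for _ in range(num_iters):
        resp = e_step(features, bases, sigma)
        bases = m_step(features, resp, num_bases)
    resp = e_step(features, bases, sigma)
    # Each reconstructed feature is a linear combination of the bases.
    recon = [
        [sum(r[k] * bases[k][d] for k in range(num_bases)) for d in range(dim)]
        for r in resp
    ]
    return bases, resp, recon


# Toy features (hypothetical data): four 3-dimensional vectors.
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
bases, resp, recon = em_bases(features)
```

Because every reconstructed feature lies in the span of `num_bases` vectors, the rank of the reconstructed set is at most `num_bases`, which is one way to read the abstract's remark about reducing the rank of the latent space.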

compact video-and-language representation, expectation-maximization contrastive learning, video-and-language representation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Neural Information Processing Systems | Aug-18-2025, 17:30:17 GMT

The core idea of contrastive learning is to pull the textual and visual representations of matched text-video pairs together and push the representations of unmatched text-video pairs apart.
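The pull-together, push-apart idea can be sketched as a symmetric InfoNCE-style loss over a batch of paired features. This is an illustrative example in plain Python, not code from the paper: the function names, the temperature value of 0.07, and the toy vectors are assumptions, and row `i` of the text features is assumed to match row `i` of the video features.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(sim):
    """Mean of -log softmax(sim[i])[i]: row i should pick column i."""
    total = 0.0
    for i, row in enumerate(sim):
        top = max(row)
        log_z = top + math.log(sum(math.exp(s - top) for s in row))
        total += log_z - row[i]
    return total / len(sim)


def symmetric_contrastive_loss(text_feats, video_feats, temperature=0.07):
    """Average of the text-to-video and video-to-text losses."""
    texts = [l2_normalize(t) for t in text_feats]
    videos = [l2_normalize(v) for v in video_feats]
    # Cosine similarity of every text with every video, scaled by temperature.
    sim = [
        [sum(a * b for a, b in zip(t, v)) / temperature for v in videos]
        for t in texts
    ]
    sim_t = [list(col) for col in zip(*sim)]  # video-to-text direction
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch (hypothetical data): matched pairs share a direction.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_videos = [[1.0, 0.0], [0.0, 1.0]]
shuffled_videos = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(texts, aligned_videos)
shuffled_loss = symmetric_contrastive_loss(texts, shuffled_videos)
```

The loss is small when each text is most similar to its own video and grows when the pairing is scrambled, so minimizing it pulls matched pairs together and pushes unmatched pairs apart.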

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country: